---
sidebar_position: 6
---

import Collapse from '@site/src/components/Collapse';

# ⁉️ Frequently Asked Questions

<Collapse title="How much VRAM a LLM model consumes?">

By default, Tabby operates in int8 mode with CUDA, requiring approximately 8GB of VRAM for CodeLlama-7B.

For ROCm, the actual limits are largely untested. However, the same CodeLlama-7B appears to use about 8GB of VRAM on an AMD Radeon™ RX 7900 XTX as well, according to ROCm monitoring tools.
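To verify actual usage on your own hardware, you can query the GPU directly while Tabby is serving; a minimal sketch:

```bash
# NVIDIA: report used vs. total VRAM.
nvidia-smi --query-gpu=memory.used,memory.total --format=csv

# AMD: report VRAM usage via ROCm's monitoring tool.
rocm-smi --showmeminfo vram
```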

</Collapse>

<Collapse title="What GPUs are required for reduced-precision inference (e.g int8)?">

* int8: Compute Capability >= 7.0, or 6.1
* float16: Compute Capability >= 7.0
* bfloat16: Compute Capability >= 8.0

To determine the mapping between a GPU card type and its compute capability, please visit [this page](https://developer.nvidia.com/cuda-gpus).
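On recent NVIDIA drivers, you can also query the compute capability directly; a quick check, assuming `nvidia-smi` is available:

```bash
# Prints the compute capability of each visible GPU (e.g. 8.6).
nvidia-smi --query-gpu=name,compute_cap --format=csv
```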

</Collapse>

<Collapse title="How to utilize multiple NVIDIA GPUs?">

Tabby only supports a single GPU per instance. To utilize multiple GPUs, start multiple Tabby instances and set `CUDA_VISIBLE_DEVICES` (for CUDA) or `HIP_VISIBLE_DEVICES` (for ROCm) accordingly, as shown in the sketch below.
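For example, a minimal sketch that pins one instance to each of two NVIDIA GPUs (the model name and ports are placeholders; adjust them to your setup):

```bash
# Instance 1 on GPU 0, listening on port 8080.
CUDA_VISIBLE_DEVICES=0 tabby serve --model StarCoder-1B --device cuda --port 8080 &

# Instance 2 on GPU 1, listening on port 8081.
CUDA_VISIBLE_DEVICES=1 tabby serve --model StarCoder-1B --device cuda --port 8081 &
```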

</Collapse>

<Collapse title="My AMD device isn't supported by ROCm">

If ROCm supports a GPU similar to yours, you can set the `HSA_OVERRIDE_GFX_VERSION` environment variable to make ROCm treat your card as that supported GPU.

For example, you can set it to `10.3.0` for RDNA2 cards and to `11.0.0` for RDNA3 cards.
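A minimal sketch for an RDNA2 card (the model name is a placeholder):

```bash
# Present the card as a supported gfx1030 (RDNA2) GPU, then start Tabby on ROCm.
HSA_OVERRIDE_GFX_VERSION=10.3.0 tabby serve --model StarCoder-1B --device rocm
```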

</Collapse>

<Collapse title="How can I use my own model with Tabby?">

Please follow the [Tabby Model Specification](https://github.com/TabbyML/tabby/blob/main/MODEL_SPEC.md) to create a directory with the specified files. You can then pass the directory path to `--model` or `--chat-model` to start Tabby.
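For instance, assuming the model lives at a hypothetical path `/data/models/my-model`:

```bash
# Serve a completion model from a local directory laid out per MODEL_SPEC.md.
tabby serve --model /data/models/my-model --device cuda
```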

</Collapse>

<Collapse title="Can I use local model with Tabby?">

Yes. Tabby supports loading models from a local directory that follows the specification outlined in [MODEL_SPEC.md](https://github.com/TabbyML/tabby/blob/main/MODEL_SPEC.md); see the previous question for an example invocation.

</Collapse>
